Abstract

Feature extraction is one of the fundamental problems of character
recognition. The performance of a character recognition system
depends on proper feature extraction and correct classifier
selection. In this article, a rapid feature extraction method named
Celled Projection (CP) is proposed, which computes the projection of
each section formed by partitioning an image. The recognition
performance of the proposed method is compared with other widely used
feature extraction methods that have been studied intensively for
many different scripts in the literature. The experiments were
conducted on Bangla handwritten numerals with three well-known
classifiers and demonstrate comparable results, including 94.12%
recognition accuracy using celled projection.
1 Introduction

During the past half century, significant research effort has been
devoted to character recognition, which translates human-readable
characters into machine-readable codes. For the Bangla language, it
is an active research area still waiting for accurate recognition
solutions, and the accuracy of such solutions depends predominantly
on proper feature extraction methods [14]. Many feature extraction
methods exist, each with its own advantages and disadvantages
relative to the others. Several important criteria need to be
considered when choosing a feature extraction method for a high
recognition rate. Firstly, an effective feature needs to be invariant
to the character shape variation caused by the writing styles of
different individuals, and should maximize the separability of
different character classes. It also needs to represent the raw image
data of a character through a reduced set of information that is most
relevant for classification (i.e., used to distinguish the character
classes), to increase the efficiency of the classification process.
Ease of implementation and fast extraction from raw data are also
considered essential for commercial real-time applications. Finally,
the need for additional preprocessing steps such as noise filtering,
binarization, smoothing and thinning reduces the practical efficiency
of a feature.

Features can be classified into two major categories, statistical
and structural features [12]. In the statistical approach, a
character image is represented by a set of n features, which can be
considered as a point in an n-dimensional feature space. The main
goal of feature selection is to construct linear or non-linear
decision boundaries in the feature space that correctly separate the
character images of different classes. The statistical approach is
usually used to reduce the dimension of the feature set for easy
extraction and fast computation, where exact reconstruction of the
original image is not essential. These features are invariant to
character deformation and writing style to some extent. Some of the
commonly used statistical features for character recognition are
projection histograms [8], crossings [1], zoning [5] and moments [2].

On the other hand, structural features such as convex or concave
strokes, end points, branches, junctions, connectivity and holes
describe the geometrical and topological properties of a character.
From a hierarchical perspective, a character is composed of simpler
components called primitives [12]. In structural pattern
classification, a character is considered as a combination of
primitives and the topological relationships among them. Stroke
primitives such as lines and curves construct the structure of a
character and are generally extracted from the skeleton that forms
the basic character shape. Extraction of structural primitives
usually requires computationally expensive preprocessing, including
binarization and skeletonization, which may cause shape distortion
and loss of structural information; as a result, character
recognition also requires a complex multilevel approximate matching
model. However, structural features are more robust against different
writing styles and distortions.

2 Feature Extraction 

This section describes some effective and well-studied statistical
feature extraction methods from the literature, which are used for
comparison with the proposed feature.

2.1 Crossings 

Crossings are a popular statistical feature for recognizing
handwritten characters [1]. A crossing count is defined as the number
of transitions from background to foreground or from foreground to
background along a straight line through the image. In other words,
it reflects the number of strokes met on a line from one side of the
image to the other. In these experiments the crossing count is
computed for every column and every row to construct the feature
vector of the image. Unlike other features, this feature is not
influenced by the width of strokes and can be computed without
skeletonizing the image.
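
As an illustration of the definition above, the following minimal
Python sketch counts transitions along every row and column of a
binary image. The function name `crossings` and the use of NumPy are
assumptions made for this example; the sketch also assumes that a
stroke touching the image border does not count as a transition.

```python
import numpy as np

def crossings(image):
    """Count background/foreground transitions along every row and column."""
    img = (np.asarray(image) > 0).astype(np.int8)
    # A transition occurs wherever two neighbouring pixels on a line differ.
    row_counts = np.abs(np.diff(img, axis=1)).sum(axis=1)  # one value per row
    col_counts = np.abs(np.diff(img, axis=0)).sum(axis=0)  # one value per column
    return np.concatenate([row_counts, col_counts])

img = np.array([
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
])
features = crossings(img)
print(features.tolist())  # -> [2, 0, 3, 2, 2, 2, 3, 0]
```

Widening a stroke changes the number of foreground pixels but not the
number of transitions, which is why the count is insensitive to
stroke width.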

2.2 Fourier Transforms 

The Fourier transform is used in many different ways in the
character recognition process [9]. Transforming character images
into the Fourier domain provides valuable information about character
structure. In the Fourier domain, low-frequency components describe
the basic shape and high-frequency components describe finer details.
For handwritten character recognition, the basic shape structure is
more essential than the finer details, because the finer details are
highly influenced by noise and writing style. We construct the
feature vector for training and classification from the 64 lowest
frequency components (to reduce the dimension of the feature vector),
discarding the high-frequency components of the spectrum. We observe
that the differences between the feature vectors of different
character classes are sometimes small, because changes in the spatial
domain do not always produce distinguishable changes in the Fourier
domain. As a result, some classifiers are unable to provide high
recognition accuracy.
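
A hedged sketch of such a feature is given below. The text does not
state which 64 components are kept or how complex values are turned
into features, so this example assumes an 8 x 8 block of coefficients
around the zero frequency and uses their magnitudes; the function
name `fourier_features` and the `block` parameter are hypothetical.

```python
import numpy as np

def fourier_features(image, block=8):
    """Magnitudes of the block x block lowest-frequency DFT coefficients."""
    # fftshift moves the zero-frequency term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(np.asarray(image, dtype=float)))
    rows, cols = spectrum.shape
    r0 = rows // 2 - block // 2
    c0 = cols // 2 - block // 2
    # Keep only the central (low-frequency) block and discard the rest.
    low = spectrum[r0:r0 + block, c0:c0 + block]
    return np.abs(low).ravel()

f = fourier_features(np.ones((16, 16)))
print(f.shape)  # -> (64,)
```

For a 16 x 16 image this yields a 64-dimensional vector, matching the
dimension used in the experiments.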

2.3 Moments 

Moment invariants have been studied extensively as a feature
extraction method in image processing and pattern recognition
[10, 11]. Different moment invariants exist for efficient and
effective extraction of features from images of different domains
[2]. The two-dimensional moment of order (p + q) of a gray-level or
binary image can be defined as

$$m_{pq} = \sum_{x} \sum_{y} x^{p} y^{q} f(x, y)$$

where $p, q = 0, 1, 2, \ldots, \infty$ and the function $f(x, y)$
gives the pixel value at the xth column and yth row of the image. The
sums are taken over all the pixels of the image. The central moments
of order (p + q), which are translation invariant, can be written as

$$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y)$$

where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$. The
translation-invariant central moments place the origin at the center
of gravity of the image. In our case, scale-invariant moments are not
essential because we use normalized images for all our experiments.
We construct a feature vector with fifteen translation-invariant
central moments of order up to four, including $\mu_{00}$,
$\mu_{01}$, $\mu_{11}$, $\mu_{20}$, $\mu_{02}$, $\mu_{03}$,
$\mu_{21}$, $\mu_{12}$, $\mu_{31}$, $\mu_{13}$, $\mu_{40}$ and
$\mu_{04}$. We use central moments only up to the fourth order
because higher-order moments are observed to be sensitive to noise
and to variation in writing style. Hu [10] introduced
rotation-invariant moments. We also studied the seven Hu moments in
our experiments, but their recognition rate is poor compared to the
other features.
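
The central moments defined above can be computed directly from the
two sums. The following Python sketch is one possible way to do so;
the function name `central_moments` and the dictionary layout of the
result are assumptions made for illustration.

```python
import numpy as np

def central_moments(image, max_order=4):
    """Return {(p, q): mu_pq} for all p + q <= max_order."""
    img = np.asarray(image, dtype=float)
    # y holds the row index and x the column index of every pixel.
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    x_bar = (x * img).sum() / m00  # m10 / m00
    y_bar = (y * img).sum() / m00  # m01 / m00
    mu = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            mu[(p, q)] = ((x - x_bar) ** p * (y - y_bar) ** q * img).sum()
    return mu

a = np.zeros((8, 8)); a[1, 1] = a[1, 2] = a[2, 1] = 1
b = np.zeros((8, 8)); b[4, 3] = b[4, 4] = b[5, 3] = 1  # same shape, shifted
ma, mb = central_moments(a), central_moments(b)
print(len(ma))  # -> 15
```

With `max_order=4` there are exactly fifteen (p, q) pairs, and the
moments of the two shifted copies agree, which illustrates the
translation invariance.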

2.4 Projection Histograms 

Glauberman [8] used projection histograms in a hardware-based OCR
system in 1956. For this feature, the image is scanned along a line
from one side to the other and the number of foreground pixels on the
line is counted. It is therefore also known as the histogram
projection count, and can be written as $H_i = \sum_{j} f(i, j)$ for
the horizontal projection, where $f(i, j)$ is the pixel value at the
ith row and jth column of the image. Here a background pixel is
considered as 0 and a foreground pixel as 1. The vertical projection
histogram can be calculated similarly. This feature is widely used in
the preprocessing steps of document image segmentation, where it
serves to segment text lines, words and characters [7]. In the
experiments, we calculate both horizontal and vertical projection
histograms and combine them into one feature vector for training and
testing. This measurement is not invariant to image size, but all the
character data used in the recognition process have the same size.
The feature does not account for stroke-width variation in
handwritten characters.
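
A minimal Python sketch of the combined horizontal and vertical
projection histogram follows. The function name
`projection_histograms` and the NumPy representation of the image are
assumptions made for this example.

```python
import numpy as np

def projection_histograms(image):
    """Concatenate the row-wise and column-wise foreground pixel counts."""
    img = (np.asarray(image) > 0).astype(int)
    horizontal = img.sum(axis=1)  # H_i: foreground pixels in row i
    vertical = img.sum(axis=0)    # foreground pixels in column j
    return np.concatenate([horizontal, vertical])

img = np.array([
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
])
hist = projection_histograms(img)
print(hist.tolist())  # -> [2, 0, 2, 1, 1, 2, 2, 0]
```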

2.5 Zoning 

The commercial OCR system Calera, reported by Bokser [5], was
developed on the basis of zonal feature extraction. According to that
study, contour extraction and thinning are not reliable for
self-touching characters. To extract this feature, an image is
divided into a number of non-overlapping or overlapping zones (Cao
[6] studied overlapping zones, viewed as fuzzy borders around the
zones of a character image). The number of foreground pixels is then
counted and the density is computed for each zone. Zoning is
sometimes combined with other features (e.g., contour direction), but
in this text we limit the word zoning to the pixel density feature,
because it is fast and simple enough to compare with the other
features used here. Zoning is relatively invariant to scaling and
slant. The feature vector in the experiments contains the densities
of 4 x 4 = 16 zones for each image. We also studied the pixel
densities of 3 x 3 = 9 zones for an image size of 15 x 15, but the
recognition rate is lower than that of 16 zones.
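
The pixel density feature for non-overlapping zones can be sketched
as follows. The function name `zoning` is hypothetical, and the sketch
assumes that the image height and width are divisible by the number
of zones per side (any remaining rows or columns are ignored).

```python
import numpy as np

def zoning(image, zones=4):
    """Foreground pixel density of each of zones x zones non-overlapping zones."""
    img = (np.asarray(image) > 0).astype(float)
    rows, cols = img.shape
    zr, zc = rows // zones, cols // zones
    # Reshape so that every zone becomes its own (zr x zc) block, then average it.
    blocks = img[:zr * zones, :zc * zones].reshape(zones, zr, zones, zc)
    return blocks.mean(axis=(1, 3)).ravel()

img = np.zeros((16, 16))
img[0:4, 0:4] = 1    # zone (0, 0) completely filled
img[4:6, 12:16] = 1  # upper half of zone (1, 3) filled
density = zoning(img)
print(density[0], density[7])  # -> 1.0 0.5
```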

2.6 Celled Projections 

In the proposed feature extraction method, for horizontal
projections, a character image is partitioned into k regions as shown
in Figure 1 and the projection is then taken for each region. For the
horizontal celled projection, the feature vector of the rth cell (or
region) of an m x n image can be written as
$P_r = (p_1, p_2, \ldots, p_m)$, where $p_i$ can be formalized as

$$p_i = \bigvee_{j=1}^{n/k} f\left(i, \frac{(r-1)\,n}{k} + j\right)$$

and $f(x, y)$ is the value of the pixel in the xth row and yth
column. Here a background pixel is considered as 0 and a foreground
pixel as 1. The feature vector of the complete image is
$V = P_1 \cup P_2 \cup \cdots \cup P_k$. Using a similar technique,
vertical and diagonal (or any other angle) celled projections can be
formulated.




Figure 1. An example of the celled projection. The geometric shapes
in the figure represent the Bangla numeral eight in standard form (on
the left) and in a distorted handwritten form (on the right). It is
noticeable that, even with those distortions, the celled projections
of both characters are quite similar.

Although the algorithm assumes that the input is a binarized image,
the proposed feature can be extracted directly from a gray-scale
image by using a threshold that separates foreground pixels from
background pixels. The arithmetic division operations of the
algorithm can be avoided by rearranging the steps with an additional
inner loop. The size function takes an image as its parameter and
returns the number of rows and columns of the image. The allocate
function reserves a vector in memory whose dimension is given as its
parameter.

HORIZONTAL-CELLED-PROJECTION(G, k)

    (m, n) ← size(G)
    V ← allocate(mk)
    q ← n/k
    for i ← 1 to m
        for j ← 1 to n
            if G[i, j] = 1 then
                V[i + m⌊(j − 1)/q⌋] ← 1
                j ← q⌈j/q⌉
    return V

Figure 2. The algorithm to compute the horizontal celled projection 
of the proposed feature. The output feature vector V is the celled 
projection of input image G divided into k sections. 
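
The algorithm of Figure 2 can be sketched in Python as follows. This
is an illustrative, vectorized variant rather than a line-by-line
transcription: it computes the same bits but does not skip the rest
of a cell once a foreground pixel is found. The function names, the
use of NumPy, zero-based indexing and the requirement that the image
width be divisible by k are assumptions of this sketch.

```python
import numpy as np

def horizontal_celled_projection(image, k):
    """Return the m*k bit vector V of an m x n binary image split into k cells."""
    img = np.asarray(image) > 0
    m, n = img.shape
    if n % k != 0:
        raise ValueError("image width must be divisible by k in this sketch")
    q = n // k  # number of columns in each cell
    # cells[r, i, j] is pixel (i, r*q + j): row i, j-th column of cell r.
    cells = img.reshape(m, k, q).transpose(1, 0, 2)
    # A bit is 1 if row i has at least one foreground pixel inside cell r.
    return cells.any(axis=2).astype(np.uint8).ravel()

def vertical_celled_projection(image, k):
    """Vertical variant obtained by transposing the input image."""
    return horizontal_celled_projection(np.asarray(image).T, k)

img = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
])
h = horizontal_celled_projection(img, 2)
v = vertical_celled_projection(img, 2)
print(h.tolist())  # -> [1, 0, 0, 1, 0, 1, 0, 1]
print(v.tolist())  # -> [1, 0, 0, 1, 0, 1, 1, 0]
```

For a 16 x 16 image, four horizontal cells give 64 bits, and four
horizontal plus four vertical cells give 128 bits.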

To calculate the vertical celled projection, we need to modify a few
steps of the above algorithm or transpose the input image. Compared
to other feature extraction methods, this method requires a small
number of logical and arithmetic operations and visits every pixel of
the image only in the worst case. Each feature $p_i$ requires only
one bit of storage, so a large number of features can be packed into
a single machine word, which significantly reduces the storage
requirement of a feature vector. The classification procedure can
also be accelerated with suitable techniques, such as measuring the
Hamming distance between machine words instead of measuring the
Euclidean distance between bits. The ease of implementation is clear
from the algorithm, which makes the proposed feature extraction
method an attractive candidate for hardware implementation. We
construct the feature vectors using both horizontal and vertical
celled projections with four and eight cells. Distortion due to
writing style has a limited effect on this feature extraction
technique.
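
The bit-packing idea can be illustrated with the short Python sketch
below. The helper names `pack_bits` and `hamming_distance` are
hypothetical, and Python integers of arbitrary size stand in for
machine words here.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 features into a single Python integer."""
    word = 0
    for b in bits:
        word = (word << 1) | (1 if b else 0)
    return word

def hamming_distance(a, b):
    """Number of differing bits between two packed feature vectors."""
    return bin(a ^ b).count("1")

a = pack_bits([1, 0, 0, 1, 0, 1, 0, 1])
b = pack_bits([1, 0, 0, 1, 0, 1, 1, 0])
print(hamming_distance(a, b))  # -> 2
```

For 0/1 vectors the squared Euclidean distance equals the Hamming
distance, so a nearest-neighbour search ranks candidates identically
under both measures.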

3 Classification 

We evaluate the performance of the different feature extraction
methods using three classifiers: the k-nearest neighbour rule (KNN),
the probabilistic neural network (PNN) and the feed forward back
propagation neural network (FBPN).

3.1 k-Nearest Neighbour

The k-nearest neighbour (KNN) rule is one of the best-known
classification techniques. Suppose we are given an unlabelled test
pattern $x$ and a set of $n$ labelled patterns
$\{x_1, x_2, \ldots, x_n\}$ that form the training set. The task of
the classifier is to predict the class label of the test pattern $x$
from P predefined classes. The KNN classifier finds the k closest
neighbours of $x$ and determines the class label of $x$ by majority
voting. The KNN classifier usually applies the Euclidean distance as
its distance metric. Although KNN is one of the simplest classifiers
and is easy to implement, it can provide competitive results even
compared to sophisticated classifiers based on multilevel training,
as our experiments show. The performance of the KNN classifier
depends on the proper choice of k and on the distance metric used to
measure the distances to the neighbours. In our experiments, we use
the Euclidean distance metric.
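
The KNN rule described above can be sketched in a few lines of
Python. The function name `knn_predict` and the tie-breaking
behaviour (the class seen first among the nearest neighbours wins)
are assumptions of this example.

```python
from collections import Counter
import numpy as np

def knn_predict(train_x, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    train_x = np.asarray(train_x, dtype=float)
    # Euclidean distance from x to every training pattern.
    dists = np.linalg.norm(train_x - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, [0.2, 0.1]))  # -> 0
print(knn_predict(train_x, train_y, [5.5, 5.2]))  # -> 1
```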

3.2 Probabilistic Neural Network 

The probabilistic neural network (PNN) is widely used as a solution
to pattern classification problems, following an approach developed
in statistics called Bayesian classification theory. PNN is a special
form of radial basis function network used for classification. It
uses a supervised learning model to learn from a training set, which
is a collection of instances or examples. Each instance has an input
vector and an output class. The PNN architecture used in these
experiments consists of two layers, a radial basis layer and a
competitive layer, and is part of the function collection of the
Matlab neural network toolbox [13]. To prepare a PNN classifier for
pattern classification, some training is required to estimate the
probability density function associated with each class. Training is
faster for PNN than for other neural network models such as
backpropagation, and it is also guaranteed to converge to an optimal
classifier as the size of the representative training set increases.

Table 1. Bangla handwritten numeral recognition results (accuracy in %) using different feature extraction methods and classifiers. The parameters denote
additional configurations of the features, which are described in Section 2.

Feature                Parameter                  KNN (k-neighbours)     PNN (spread factor)    FBPN (hidden-layer neurons)
                                                  3      5      7        0-1    1-2    2-∞      21-30  31-40  41-50
Celled Projections     4 Horizontal               92.17  92.17  91.97    92.60  91.50  83.93    87.97  89.40  89.37
Celled Projections     8 Horizontal               92.43  92.30  92.17    92.30  92.27  87.00    85.67  87.37  87.30
Celled Projections     4 Horizontal & 4 Vertical  94.10  93.93  93.73    93.87  94.12  89.63    91.20  92.03  91.93
Crossings              Horizontal & Vertical      85.80  85.97  86.33    86.40  84.70  76.83    84.80  85.07  85.87
Fourier Transforms     64 Low Frequency           71.80  72.87  73.30    47.33  71.13  73.23    66.73  67.00  67.67
Moments                15 Central Moments         67.60  67.37  68.07    10.00  10.00  67.57    84.73  85.23  86.70
Projection Histograms  Horizontal & Vertical      82.33  82.37  82.77    82.10  82.93  83.30    81.90  82.47  82.57
Zoning                 4 x 4                      90.30  90.27  90.27    89.80  77.03  76.33    86.77  87.63  87.73

In the PNN architecture used in these experiments, when an input is
presented, the first layer computes the distances from the input
vector to the training input vectors and produces a vector whose
elements indicate how close the input is to each training input. The
second layer sums these contributions for each class of inputs to
produce, as its net output, a vector of probabilities. Finally, a
compete transfer function on the output of the second layer picks the
maximum of these probabilities, and produces a 1 for that class and a
0 for the other classes. The performance of the PNN depends on the
spread factor. The classifier acts as a nearest neighbour classifier
if the spread factor is near zero. As the spread factor becomes
larger, the designed network takes several nearby design vectors into
account. Some disadvantages of PNN, including its non-generalized
model, large memory requirement and slow classification phase, favour
other neural network architectures in application fields.
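
The two-layer computation described above can be sketched as follows.
This is a simplified illustration, not the Matlab toolbox
implementation: it assumes a Gaussian kernel of the form
exp(-d^2 / (2 * spread^2)), whose scaling may differ from the
toolbox, and the function name `pnn_predict` and its parameters are
hypothetical.

```python
import numpy as np

def pnn_predict(train_x, train_y, x, spread=1.0, n_classes=10):
    """Return a one-hot vector marking the class chosen by a simple PNN."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    # Radial basis layer: response to the distance from each training vector.
    d2 = ((train_x - np.asarray(x, dtype=float)) ** 2).sum(axis=1)
    activations = np.exp(-d2 / (2.0 * spread ** 2))
    # Summation: add up the contributions of the training vectors of each class.
    scores = np.array([activations[train_y == c].sum() for c in range(n_classes)])
    # Competitive layer: 1 for the winning class, 0 for all the others.
    one_hot = np.zeros(n_classes, dtype=int)
    one_hot[np.argmax(scores)] = 1
    return one_hot

train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
print(pnn_predict(train_x, train_y, [0.2, 0.1], n_classes=2).tolist())  # -> [1, 0]
print(pnn_predict(train_x, train_y, [5.5, 5.2], n_classes=2).tolist())  # -> [0, 1]
```

Every training vector is stored and visited at classification time,
which reflects the memory and speed disadvantages mentioned above.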

3.3 Backpropagation 

Artificial neural networks such as the feed forward back propagation
neural network (FBPN) have been demonstrated to be useful in
practical applications. A neural network develops its information
categorization capabilities through a process of learning from
examples, known as training. In this training process the network
adjusts its weights and biases to perform accurate classification.
One of the most common learning methods used in this training process
is called back-propagation (BP). When the network is presented with a
set of training data, the BP algorithm computes the difference
between the actual output and the desired output, feeds the error in
the output back through the network, and corrects the weights and
biases that are responsible for the error. In our experiments, we use
a simple multilayer feed forward network with a single hidden layer
to compare the performance of the feature extraction methods, so that
their performance is not overshadowed by network performance.
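
A minimal sketch of one back-propagation update for a network with a
single hidden layer is given below. The sigmoid activations, the mean
squared error loss, the learning rate and the toy data are
assumptions made for illustration; they are not the settings of the
experiments reported here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(params, x, t, lr=0.5):
    """One batch update; returns the mean squared error before the update."""
    w1, b1, w2, b2 = params
    # Forward pass through the hidden and output layers.
    h = sigmoid(x @ w1 + b1)
    y = sigmoid(h @ w2 + b2)
    err = y - t  # difference between actual and desired output
    loss = np.mean(err ** 2)
    # Backward pass: feed the output error back to both layers.
    delta2 = err * y * (1.0 - y)
    delta1 = (delta2 @ w2.T) * h * (1.0 - h)
    n = x.shape[0]
    # Correct the weights and biases in place.
    w2 -= lr * (h.T @ delta2) / n
    b2 -= lr * delta2.mean(axis=0)
    w1 -= lr * (x.T @ delta1) / n
    b1 -= lr * delta1.mean(axis=0)
    return loss

rng = np.random.default_rng(0)
x = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[1., 0.], [0., 1.], [0., 1.], [0., 1.]])  # toy one-hot targets
params = [rng.normal(0, 0.5, (2, 5)), np.zeros(5),
          rng.normal(0, 0.5, (5, 2)), np.zeros(2)]
losses = [train_step(params, x, t) for _ in range(2000)]
```

On this toy problem the recorded error falls as training proceeds;
in a setting like the one described here, the output layer would have
one neuron per numeral class.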

4 Experimental Results 

We have collected 12000 Bangla numeral samples from 120 different
writers [4]. Each writer was provided with a grid sheet and asked to
write the Bangla numerals from 0 to 9 in the appropriate boxes of the
grid ten times. Writers were encouraged to use all their writing
style variations when filling in the grid sheet. We use a portion of
the total dataset for faster training and testing of the described
features. The experiments were conducted with 6000 Bangla numeral
samples for training and an independent portion of 3000 Bangla
numeral samples for testing, on which the recognition performance is
calculated. All input numeral images are normalized to a size of
16 x 16 after computing their bounding rectangles. For FBPN, 20% of
the samples used for training are held out for validation to avoid
overfitting. We varied the number of neurons in the hidden layer from
21 to 50, divided this range into three subranges and report the best
result for each subrange in Table 1. The number of neurons in the
output layer is always fixed (i.e., 10 neurons). Compared to the
other features described in this text, training the FBPN classifier
on celled projections with four cells required only half the time on
average. The subranges for PNN are not equally sized, because
recognition accuracy decreases as the spread factor grows beyond 3.0
for most of the features (i.e., for them the smallest spread factor
tested above 3.0 provides the best result in that subrange), whereas
Fourier transforms and moments provide their best recognition
accuracy at 9.0 and 900.0, respectively. Since each subrange contains
infinitely many real values, we test a number of values distributed
uniformly throughout each subrange. We report the performance of the
KNN classifier for k = 3, 5 and 7 in Table 1; all the features
provide their highest recognition accuracy for these values of k.
Unlike with celled projection, simple classifiers such as KNN and PNN
could not provide an acceptable recognition rate with the moments
feature extraction method, which also required a long training time
for the more complex FBPN classifier to reach an acceptable
recognition rate. In these experiments, the highest recognition rate
achieved for Bangla numerals is 94.12%, using celled projection with
four horizontal and four vertical cells and the PNN classifier.
Celled projection also provides the highest recognition rate, 94.10%,
for the simplest classifier, KNN, which implies that celled
projections do not need additional support from complex classifiers.
Zoning and crossings also provide good recognition accuracy for
different classifiers.

5 Conclusion 

The main purpose of these experiments is to compare the performance
of different feature extraction methods, including the proposed
method, across different classifiers. The proposed method achieved
94.12% recognition accuracy with PNN, which is the highest
recognition accuracy in our experimental arrangements. Each feature
described here performs outstandingly in some cases and poorly in
others. Thus the aggregate recognition rates of these individual
features and classifiers are not excellent, but combining different
techniques, such as different numbers of celled projections with
multiple classifier systems [3], could provide an excellent
recognition rate.